{ "cells": [ { "cell_type": "markdown", "source": [ "# Pandas Dataframe Exercises\n", "## Try me\n", "[Open in Colab](https://colab.research.google.com/github/ffraile/computer_science_tutorials/blob/main/source/Data%20Manipulation/exercises/Pandas%20Dataframes.ipynb) [Open in Binder](https://mybinder.org/v2/gh/ffraile/computer_science_tutorials/main?labpath=source%2FData%20Manipulation%2Fexercises%2FPandas%20Dataframes.ipynb)\n", "\n", "In the first exercises, we are going to use the open dataset from the National Institute of Diabetes and Digestive and Kidney Diseases, which is available on Kaggle. The dataset contains diagnostic measurements of patients and whether they tested positive for diabetes.\n", "You can find it at this URL:\n", "\n", "https://www.kaggle.com/uciml/pima-indians-diabetes-database\n", "\n", "We have downloaded the dataset and uploaded it to the course repository. You can find it at the following URL:\n", "\n", "https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/diabetes.csv\n", "\n", "The dataset contains the following columns:\n", "\n", "* Pregnancies: Number of times pregnant\n", "* Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test\n", "* BloodPressure: Diastolic blood pressure (mm Hg)\n", "* SkinThickness: Triceps skin fold thickness (mm)\n", "* Insulin: 2-hour serum insulin (mu U/ml)\n", "* BMI: Body mass index (weight in kg/(height in m)^2)\n", "* DiabetesPedigreeFunction: Diabetes pedigree function\n", "* Age: Age (years)\n", "* Outcome: Class variable (0 or 1), where class value 1 is interpreted as \"tested positive for diabetes\"\n", "\n", "Class distribution: 268 of the 768 records have Outcome 1, the others have Outcome 0.\n", "\n", "The following code loads the dataset into a Pandas dataframe:" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "diabetes_pd = 
pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/diabetes.csv')\n", "diabetes_pd" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" }, "ExecuteTime": { "end_time": "2025-12-25T18:44:29.062132Z", "start_time": "2025-12-25T18:44:28.488505Z" } }, "outputs": [ { "data": { "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", ".. ... ... ... ... ... ... \n", "763 10 101 76 48 180 32.9 \n", "764 2 122 70 27 0 36.8 \n", "765 5 121 72 23 112 26.2 \n", "766 1 126 60 0 0 30.1 \n", "767 1 93 70 31 0 30.4 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50 1 \n", "1 0.351 31 0 \n", "2 0.672 32 1 \n", "3 0.167 21 0 \n", "4 2.288 33 1 \n", ".. ... ... ... \n", "763 0.171 63 0 \n", "764 0.340 27 0 \n", "765 0.245 30 0 \n", "766 0.349 47 1 \n", "767 0.315 23 0 \n", "\n", "[768 rows x 9 columns]" ], "text/html": [ "
| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | 10 | 101 | 76 | 48 | 180 | 32.9 | 0.171 | 63 | 0 |
| 764 | 2 | 122 | 70 | 27 | 0 | 36.8 | 0.340 | 27 | 0 |
| 765 | 5 | 121 | 72 | 23 | 112 | 26.2 | 0.245 | 30 | 0 |
| 766 | 1 | 126 | 60 | 0 | 0 | 30.1 | 0.349 | 47 | 1 |
| 767 | 1 | 93 | 70 | 31 | 0 | 30.4 | 0.315 | 23 | 0 |
768 rows × 9 columns
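
As a quick sanity check on the column description, the sketch below inspects the shape and the class distribution of `Outcome`. To keep it runnable offline, it rebuilds only the first five rows shown in the output above; `sample_pd` is a hypothetical name introduced for illustration and is not part of the original notebook.

```python
import pandas as pd

# First five rows copied from the printed output, so no download is needed.
sample_pd = pd.DataFrame(
    {
        "Pregnancies": [6, 1, 8, 1, 0],
        "Glucose": [148, 85, 183, 89, 137],
        "BloodPressure": [72, 66, 64, 66, 40],
        "SkinThickness": [35, 29, 0, 23, 35],
        "Insulin": [0, 0, 0, 94, 168],
        "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
        "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
        "Age": [50, 31, 32, 21, 33],
        "Outcome": [1, 0, 1, 0, 1],
    }
)

# Number of rows and columns in the sample.
n_rows, n_cols = sample_pd.shape

# How many rows fall into each Outcome class (0 or 1).
outcome_counts = sample_pd["Outcome"].value_counts()

# Fraction of rows with Outcome == 1 (tested positive).
positive_share = (sample_pd["Outcome"] == 1).mean()

print(n_rows, n_cols)
print(outcome_counts)
print(positive_share)
```

Running the same three expressions on `diabetes_pd` instead of `sample_pd` should, according to the dataset description, report 768 rows, 9 columns and 268 rows with `Outcome` equal to 1.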
| | Date | China | US | United_Kingdom | Italy | France | Germany | Spain | Iran |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 548 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2020-01-23 | 643 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2020-01-24 | 920 | 2 | 0 | 0 | 2 | 0 | 0 | 0 |
| 3 | 2020-01-25 | 1406 | 2 | 0 | 0 | 3 | 0 | 0 | 0 |
| 4 | 2020-01-26 | 2075 | 5 | 0 | 0 | 3 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 728 | 2022-01-19 | 118370 | 68684431 | 15610069 | 9219391 | 15288014 | 8361262 | 8676916 | 6231909 |
| 729 | 2022-01-20 | 118470 | 69329860 | 15718193 | 9418256 | 15715329 | 8502132 | 8834363 | 6236567 |
| 730 | 2022-01-21 | 118544 | 70209840 | 15814617 | 9603856 | 16116748 | 8635461 | 8975458 | 6241843 |
| 731 | 2022-01-22 | 118616 | 70495874 | 15891905 | 9781191 | 16506090 | 8716804 | 8975458 | 6245346 |
| 732 | 2022-01-23 | 118773 | 70699416 | 15966838 | 9923678 | 16807733 | 8773030 | 8975458 | 6250490 |
733 rows × 9 columns